Nearly Optimal State Feedback Controls for
Stochastic  Systems  with  Wideband  Noise  Disturbances 


by 

Harold  J.  Kushner  and  W.  Runggaldier 


Abstract

Much of optimal stochastic control theory is concerned with diffusion
models.  Such  models  are  often  only  idealizations  (or  limits  in  an  appropriate 
sense)  of  the  actual  physical  process,  which  might  be  driven  by  a  wide  bandwidth 
(not  white)  process  or  be  a  discrete  parameter  system  with  correlated  driving 
noises.  Optimal  or  nearly  optimal  controls,  derived  for  the  diffusion  models, 
would  not  normally  be  useful  or  even  of  much  interest,  if  they  were  not  also 
’nearly  optimal’  for  the  physical  system  which  the  diffusion  approximates.  It 
turns  out  that,  under  quite  broad  conditions,  the  ’nearly  optimal’  controls  for  the 
diffusions do have this desired robustness property and are ’nearly optimal’ for
the physical (say wide band noise driven) process, even when compared to
controls  which  can  depend  on  all  the  (past)  driving  noise.  We  treat  the  problem 
over  a  finite  time  interval,  as  well  as  the  average  cost  per  unit  time  problem. 
Extensions  to  discrete  parameter  systems,  and  to  systems  stopped  on  first  exit  from 
a  bounded  domain  are  also  discussed.  Weak  convergence  methods  provide  the 
appropriate analytical tools.


1.  Introduction 


The  paper  is  concerned  with  "approximately  optimal"  controls  for  a  wide 
variety  of  systems  driven  by  wide  band-width  noise,  and  their  discrete 
parameter  counterparts.  Consider  a  system  of  the  type 

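(1.1)  $\dot x^\epsilon = F^\epsilon(x^\epsilon, \xi^\epsilon, u)$,  $x \in R^r$, Euclidean r-space,

where $\xi^\epsilon(\cdot)$ is a wide band-width noise process (the band-width $\to \infty$ as $\epsilon \to 0$),
and the cost is

(1.2)  $R^\epsilon(u) = E \int_0^{T_1} k(x^\epsilon(s), u^\epsilon(s))\,ds$,

for some $T_1 < \infty$. When we wish to emphasize the control, we write the
solution to (1.1) as $x^\epsilon(u^\epsilon,\cdot)$.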

For the moment (and loosely speaking) suppose that (1.1) is ’close’ to a
controlled diffusion process, modelled by (1.3), in the sense that if $u^\epsilon(\cdot)$ is a
sequence of ’nice’ controls for (1.1), then there is a control $u(\cdot)$, and a
corresponding controlled diffusion $x(u,\cdot)$ defined by (1.3), such that as $\epsilon \to 0$,
$x^\epsilon(u^\epsilon,\cdot) \Rightarrow x(u,\cdot)$, where $\Rightarrow$ denotes weak convergence (see the next section).
Let $u(\cdot)$ denote an optimal control for the limit diffusion (1.3), and $u^\delta(\cdot)$ a
’smooth’ $\delta$-optimal control, where $\delta > 0$.

(1.3)  $dx = \bar b(x,u)\,dt + \sigma(x)\,dw$

Now apply $u^\delta(\cdot)$ to (1.1). Under fairly broad conditions, it is shown that

(1.4)  $\inf_{u \in RC^\epsilon} R^\epsilon(u) \ge R^\epsilon(u^\delta) - \delta$

for small $\epsilon > 0$, where $RC^\epsilon$ are the admissible (relaxed) controls for (1.1)
(see Section 3). Since $u^\delta(\cdot)$ is only a function of $x$ and $t$, it would be
considerably simpler than an optimal control for (1.1).

The  methods  also  work  well  for  the  discrete  parameter  case 

(1.5)  $x^\epsilon_{n+1} = x^\epsilon_n + \epsilon F^\epsilon(x^\epsilon_n, \xi^\epsilon_n, u_n)$.

The $\{\xi^\epsilon_n\}$ and $\xi^\epsilon(\cdot)$ can be state dependent, and there are straightforward
extensions  to  the  discounted  cost  problem,  to  the  problem  where  the  process  is 
stopped  on  first  exit  from  a  set,  to  the  impulsive  control  problem,  and  to  the 
average  cost  per  unit  time  case. 

The  basic  technique  is  that  of  weak  convergence  theory  [1],  [2],  [3],  which 
will  be  seen  to  provide  a  very  natural  and  relatively  simple  basis  for  results 
of  the  type  presented  here.  The  relevant  background  results  are  listed  in 
Section 2. In Section 3 the problem on a finite interval $[0,T_1]$ for a form of
(1.1)  is  set  up,  and  the  assumptions  stated.  For  convenience  in  dealing  with 
the  weak  convergence,  as  well  as  to  minimize  detail  and  the  number  of 
hypotheses,  we  work  with  relaxed  controls.  The  relevant  estimates  and 
approximations  (the  "chattering"  lemma,  etc.)  are  also  stated  in  Section  3.  In 
Section  4,  the  results  for  the  finite  interval  are  proved.  Section  5  concerns 
the  discrete  parameter  case.  The  average  cost  per  unit  time  problem  is  in 
Section  6,  and  extensions  are  discussed  in  Section  8. 

A  related  problem  is  discussed  by  Bensoussan  and  Blankenship  in  [4],  [5]. 
They  deal  with  the  particular  non-degenerate  system 

(1.6)  $dx^\epsilon = f(x^\epsilon,y^\epsilon,u)\,dt + \sqrt{2}\,dw$,
       $\epsilon\,dy^\epsilon = g(x^\epsilon,y^\epsilon,u)\,dt + \sqrt{2\epsilon}\,dB$,

where $w(\cdot)$ and $B(\cdot)$ are mutually independent standard Wiener processes. The
technique  in  [4,5]  concerns  an  asymptotic  expansion  of  the  Bellman  equation 
associated with the optimal control of (1.6). These expansions are hard to
carry out, and rely heavily on various non-degeneracy properties associated
with (1.6). In a "linear-quadratic" problem, they show that applying the
optimal control for the limit problem to the pre-limit problem gives a cost
increase of $O(\epsilon)$. There is negligible overlap in methodology with the ideas
here. We can treat (1.6) if $g(\cdot)$ does not depend on $u(\cdot)$.

The  results  in  [4,5]  seem  to  require  an  analytical  approach,  rather  than 
our  purely  probabilistic  approach.  The  methods  used  here  seem  quite  simple 
in  comparison,  and  cover  a  broader  collection  of  problems.  Expansions  of  the 
value functions do not seem to be obtainable by our methods. On the other
hand, we can show, for many typical problem formulations, that the optimal
or $\delta$-optimal control for the limit system is a good (nearly optimal) control for
the  system  which  is  driven  by  wide  band-width  noise.  Such  robustness  is  an 
important  part  of  the  statement  of  the  control  problem.  In  fact,  the  optimal 
or  nearly  optimal  controls  for  diffusion  models  would  not  usually  be  of 
interest,  were  they  also  not  good  controls  for  the  actual  physical  system  which 
is ’idealized’ by the diffusion model. The general ideas carry over to more
general  spaces  (e.g.,  to  measure  valued  processes). 

2. Weak Convergence

Let $C^r[0,\infty)$ denote the space of $R^r$-valued continuous functions with the
sup norm topology on bounded intervals, and let $D^r[0,\infty)$ denote the space of
$R^r$-valued functions which are right continuous and have left hand limits.
Endow $D^r[0,\infty)$ with the Skorohod topology [2]. Our processes (except for the
discrete parameter case) have values in $C^r$, but it is easier to prove tightness in
$D^r$, and then to show that all limits are continuous.

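Let $F_t^\epsilon$ denote the minimal $\sigma$-algebra over which $\{x^\epsilon(s), \xi^\epsilon(s),\ s \le t\}$ is
measurable, and let $E_t^\epsilon$ denote expectation conditioned on $F_t^\epsilon$. Let $f(\cdot)$ and $g(\cdot)$ be
progressively measurable with respect to $\{F_t^\epsilon\}$. We say that $f(\cdot)$ is in $D(\hat A^\epsilon)$,
the domain of the operator $\hat A^\epsilon$, and $\hat A^\epsilon f = g$, if for each $T < \infty$

$\sup_{t \le T} E|g(t)| < \infty$,  $E|g(t+\delta) - g(t)| \to 0$ as $\delta \downarrow 0$, each $t$,

$\sup_{t \le T,\ \delta > 0} E\left| \frac{E_t^\epsilon f(t+\delta) - f(t)}{\delta} \right| < \infty$,

$\lim_{\delta \downarrow 0} E\left| \frac{E_t^\epsilon f(t+\delta) - f(t)}{\delta} - g(t) \right| = 0$, each $t$.

If $f(\cdot) \in D(\hat A^\epsilon)$ then ([3], [6])

(2.1)  $f(t) - \int_0^t \hat A^\epsilon f(s)\,ds$ is a martingale

and

(2.2)  $E_t^\epsilon f(t+s) - f(t) = \int_t^{t+s} E_t^\epsilon \hat A^\epsilon f(u)\,du$.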

The following condition for tightness in $D^r[0,\infty)$ (Theorem 3.4, [3]) is a
sufficient condition for a criterion of Aldous and Kurtz [2]. Let $\hat C_0$ denote
the continuous real valued functions on $R^r$ with compact support, and $\hat C_0^k$ the
subset of functions all of whose mixed partial derivatives of order up to $k$ are
continuous.


Theorem 0.

Let $x^\epsilon(\cdot)$ have paths in $D^r[0,\infty)$ and let

(2.3)  $\lim_{K \to \infty} \limsup_{\epsilon} P\{\sup_{t \le T} |x^\epsilon(t)| \ge K\} = 0$, each $T < \infty$.

For each $f(\cdot) \in \hat C_0$ and $T < \infty$ let there be a sequence $f^\epsilon(\cdot) \in D(\hat A^\epsilon)$
such that either I or II below holds. Then $\{x^\epsilon(\cdot)\}$ is tight in $D^r[0,\infty)$.

I. For each $T < \infty$, $\{\hat A^\epsilon f^\epsilon(t),\ \epsilon > 0,\ t \le T\}$ is uniformly integrable and for
each $\alpha > 0$

(2.4)  $\lim_{\epsilon} P\{\sup_{t \le T} |f^\epsilon(t) - f(x^\epsilon(t))| \ge \alpha\} = 0$.

II. (2.4) holds and for each $T < \infty$ there is a random variable $B^\epsilon(f)$
such that

(2.5)  $\sup_{t \le T} |\hat A^\epsilon f^\epsilon(t)| \le B^\epsilon(f)$,  $\lim_{K \to \infty} \limsup_{\epsilon} P\{B^\epsilon(f) \ge K\} = 0$.


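Consider a discrete parameter case

$x^\epsilon_{n+1} = x^\epsilon_n + \epsilon F^\epsilon(x^\epsilon_n, \xi^\epsilon_n)$.

Let $F_n^\epsilon$ denote the minimal $\sigma$-algebra over which $\{x^\epsilon_i, \xi^\epsilon_i,\ i < n\}$ is measurable,
with $E_n^\epsilon$ denoting the associated conditional expectation. We say that $f(\cdot) \in D(\hat A^\epsilon)$
if it is constant on each $[n\epsilon, n\epsilon + \epsilon)$ interval, $f(n\epsilon)$ is $F_n^\epsilon$-measurable, and
$\sup_n E|f(n\epsilon)| < \infty$. Then we define

$\hat A^\epsilon f(n\epsilon) = [E_n^\epsilon f(n\epsilon + \epsilon) - f(n\epsilon)]/\epsilon$,

and the discrete parameter analogs of Theorem 0 and (2.1), (2.2) hold. In
particular for $f \in D(\hat A^\epsilon)$,

$E_n^\epsilon f(n\epsilon + m\epsilon) - f(n\epsilon) = \epsilon \sum_{i=n}^{n+m-1} E_n^\epsilon \hat A^\epsilon f(i\epsilon)$.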
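
The summation identity above can be checked numerically in a simple special case. The following is only a sketch under assumptions that are not part of the paper: the underlying process is taken to be a finite state Markov chain $X_n$ with transition matrix `P`, and $f(n\epsilon) = h(X_n)$, so that the conditional expectation $E_n^\epsilon f(n\epsilon+\epsilon)$ is the vector `P @ h` evaluated at $X_n$. The names `discrete_generator` and `telescoped_increment` are hypothetical.

```python
import numpy as np

def discrete_generator(P, h, eps):
    """(P h - h) / eps: the discrete operator applied to f(n*eps) = h(X_n)."""
    return (P @ h - h) / eps

def telescoped_increment(P, h, eps, m):
    """eps * sum_{j=0}^{m-1} E_n[A f((n+j) eps)], as a vector indexed by the state X_n."""
    Ah = discrete_generator(P, h, eps)
    total = np.zeros_like(h, dtype=float)
    Pj = np.eye(len(h))          # P^j, starting with j = 0
    for _ in range(m):
        total += Pj @ Ah         # E_n of the operator value j steps ahead
        Pj = Pj @ P
    return eps * total

# Assumed three-state chain and test function (illustrative values only).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
h = np.array([1.0, -2.0, 0.5])
eps, m = 0.1, 5

lhs = np.linalg.matrix_power(P, m) @ h - h   # E_n f(n eps + m eps) - f(n eps)
rhs = telescoped_increment(P, h, eps, m)
print(np.allclose(lhs, rhs))
```

In this special case the identity is just the telescoping sum $\sum_{j<m} P^j(P - I)h = (P^m - I)h$, which is why the two sides agree for any $\epsilon > 0$.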


Let $M(\infty)$ denote the collection of measures $\{m(\cdot)\}$ on the Borel subsets of $U
\times [0,\infty)$, where $U$ is compact and $m([0,t] \times U) = t$, for all $t \ge 0$. We will be
working with weak convergence of a sequence of $M(\infty)$-valued random
variables. Topologize $M(\infty)$ as follows. Let $\{f_{ni}(\cdot),\ i < \infty\}$ be a countable dense
(sup norm) set of continuous functions on $[0,n] \times U$. Let $(m,f) = \int f(s,\alpha)\,m(ds
\times d\alpha)$, and define

$d(m',m'') = \sum_{n=1}^{\infty} 2^{-n} d_n(m',m'')$,

where

$d_n(m',m'') = \sum_{i=1}^{\infty} 2^{-i} \frac{|(m'-m'',f_{ni})|}{1 + |(m'-m'',f_{ni})|}$.

When we say that $m^n(\cdot) \Rightarrow m(\cdot)$ for a sequence of random measures, we
always mean weak convergence in $M(\infty)$.


3.  Assumptions  and  Relaxed  Controls 


We  adopt  a  particular  noise  model  which  is  a  standard  way  of  modelling 
wide  band-width  noise.  The  model  can  readily  be  generalized,  since  only  a 
few  properties  of  the  processes  are  used.  The  model  is  convenient  also  because 
the  relevant  weak  convergence  results  can  be  easily  referred  to.  A  control 
$u(\cdot)$ for (1.1) is said to be admissible if it takes values in $U$, a compact set,
and it is progressively measurable with respect to the $\sigma$-algebras $\sigma(\xi^\epsilon(s),\ s \le t)$.

A random measure $m(\cdot)$ with values in $M(\infty)$ is said to be an admissible
relaxed control if $\int_0^t \int f(s,\alpha)\,m(ds \times d\alpha) = (f,m)_t$ is progressively measurable with
respect to $\{F_t^\epsilon\}$ for each bounded continuous $f(\cdot)$. If $m(\cdot)$ is admissible, then
there is a measure valued function $m_t(\cdot)$ of $(\omega,t)$ such that for smooth $f(\cdot)$

$\int_0^t \int f(s,\alpha)\,m(ds \times d\alpha) = \int_0^t ds \int f(s,\alpha)\,m_s(d\alpha)$,

and $m_t(\cdot)$ is (weakly) progressively measurable in the sense that
$\int_0^t ds \int f(s,\alpha)\,m_s(d\alpha)$ is progressively measurable. Let $AC^\epsilon$ and $RC^\epsilon$ denote the class
of admissible and admissible relaxed controls, respectively, for (1.1).

Assumptions 

A1. $\xi^\epsilon(t) = \xi(t/\epsilon^2)$, where $\xi(\cdot)$ is a stationary zero mean process which is
either (a) strongly mixing*, right continuous and bounded, with the mixing
rate function $\phi(\cdot)$ satisfying $\int_0^\infty \phi^{1/2}(s)\,ds < \infty$, or (b) stationary Gauss-Markov
with an integrable correlation function (which thus must go to zero
exponentially).

* I.e., for $A \in \sigma(\xi(v),\ v \le t)$ and $B \in \sigma(\xi(v),\ v \ge s+t)$, $\sup_{A,B} |P(B|A) - P(B)| \le \phi(s)$.

A2. $F^\epsilon(x,\xi,u) = b(x,u) + \tilde b(x,\xi) + g(x,\xi)/\epsilon$, where $E\tilde b(x,\xi) = Eg(x,\xi) = 0$ under
(A1a), and $g(x,\xi) = g(x)\xi$, $\tilde b(x,\xi) = \tilde b(x)\xi$ under (A1b). $k(\cdot,\cdot)$ is bounded and
continuous, and $b(\cdot,\cdot)$, $\tilde b(\cdot,\cdot)$, $g(\cdot,\cdot)$ are continuous. The derivative $g_x(\cdot,\xi)$ is
continuous (in $x,\xi$). Also $b(\cdot,\alpha)$ satisfies a linear growth condition and a
Lipschitz condition in $x$, uniformly in $\alpha \in U$. Under (A1a), $\tilde b(\cdot,\xi)$, $g(\cdot,\xi)$,
$g_x(\cdot,\xi)$ satisfy the same uniform Lipschitz and growth condition, and under
(A1b), $\tilde b(\cdot)$, $g(\cdot)$ and $g_x(\cdot)$ do.

Define

$a(x) = \{a_{ij}(x)\} = \int_{-\infty}^{\infty} E g(x,\xi(t))\, g'(x,\xi(0))\,dt$,

$\bar b_i(x,u) = b_i(x,u) + \int_0^\infty E \sum_j g_{i,x_j}(x,\xi(t))\, g_j(x,\xi(0))\,dt$,  $i \le r$.

A3. Suppose that $\{a_{ij}(\cdot)\}$ has a Lipschitz continuous square root $\sigma(\cdot)$.
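
To see how the scaling $\xi^\epsilon(t) = \xi(t/\epsilon^2)$ with the factor $1/\epsilon$ produces a diffusion coefficient of the form of $a(x)$, consider a scalar illustration. It is an assumption of this sketch, not a case treated in the text, that $g(x,\xi) = \xi$ and that $\xi(\cdot)$ has correlation function $\exp(-\lambda|t|)$. Then $a = \int_{-\infty}^{\infty} e^{-\lambda|t|}\,dt = 2/\lambda$, and the variance of $(1/\epsilon)\int_0^t \xi^\epsilon(s)\,ds$ can be written in closed form. The function names below are hypothetical.

```python
import math

def scaled_noise_variance(t, eps, lam=1.0):
    """Variance of (1/eps) * int_0^t xi(s/eps^2) ds when E xi(u) xi(v) = exp(-lam*|u-v|)."""
    c = lam / eps**2  # decay rate of the correlation of xi^eps
    # int_0^t int_0^t exp(-c|u-v|) du dv = 2 * (t/c - (1 - exp(-c t)) / c^2)
    double_integral = 2.0 * (t / c - (1.0 - math.exp(-c * t)) / c**2)
    return double_integral / eps**2

def limit_variance(t, lam=1.0):
    """a * t with a = int_{-inf}^{inf} exp(-lam*|s|) ds = 2 / lam."""
    return 2.0 * t / lam

for eps in (1.0, 0.3, 0.1, 0.01):
    print(eps, scaled_noise_variance(1.0, eps), limit_variance(1.0))
```

As $\epsilon \to 0$ the printed variance approaches $2t/\lambda$, the variance of $\sigma w(t)$ with $\sigma^2 = a$; the gap is $2\epsilon^2(1 - e^{-\lambda t/\epsilon^2})/\lambda^2$.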

For the problem on $[0,T_1]$, the boundedness condition on $k(\cdot,\cdot)$ can be
replaced by a polynomial growth condition. For the average cost per unit time
problem, the stability methods and assumptions of Section 7 can be used for
the same purpose.

The weak convergence and existence (of an optimal control) arguments
are easier if one works with relaxed controls. It is convenient to work with
relaxed controls on $[0,\infty)$. If the control problem is of interest on $[0,T_1]$ only,
then define $u(\cdot)$ or $m(\cdot)$ in any admissible way on $[T_1,\infty)$.

Admissible controls for (1.3) or (3.1) below. An admissible control for
(1.3) is any $U$-valued function $u(\cdot)$ which is non-anticipative with respect to
$w(\cdot)$. An admissible relaxed control for (1.3) or (3.1) below is any $M(\infty)$ valued
random variable $m(\cdot)$ such that for any collection $\{f_p(\cdot)\}$ of bounded
continuous functions $f_p(\cdot)$, and each $t > 0$, $\{\int_0^t \int f_p(s,\alpha)\,m(ds \times d\alpha)\}$ is
independent of $\{w(t+s) - w(t),\ s > 0\}$. If $m(\cdot)$ is an admissible relaxed control
then there is a ($\omega,t$-dependent) measure $m_t(\cdot)$ on the Borel sets of $U$ such that

$\int_0^t \int f(s,\alpha)\,m(ds \times d\alpha) = \int_0^t ds \int f(s,\alpha)\,m_s(d\alpha)$,  $t < \infty$,

for each bounded and continuous $f(\cdot)$ and almost all $\omega$. When working with
(1.3) or (3.1), we assume that $\bar b(\cdot)$ and $\sigma(\cdot)$ have the continuity, growth and
Lipschitz conditions ascribed to $b(\cdot)$ and $\sigma(\cdot)$ in (A1)-(A3). Let $AC$ and $RC$
denote the class of admissible and admissible relaxed controls, respectively.

Theorem 1.

Let $m(\cdot)$ be an admissible relaxed control (with respect to a Wiener process
$w(\cdot)$). Then there exists a non-anticipative solution to

(3.1)  $dx = dt \int \bar b(x,\alpha)\,m_t(d\alpha) + \sigma(x)\,dw$,  $x(0) = x$,

and

(3.2)  $E \sup_{t \le T} |x(t)|^2 \le K[1 + |x|^2]$,

where $K$ depends only on $T$ and on the growth rates and Lipschitz constants.

Let $x^\Delta(\cdot)$ denote the solution of the discrete parameter approximation (3.3) to (3.1).
Then there is a $K_\Delta \to 0$ as $\Delta \to 0$ (depending only on $T$ and on the Lipschitz
and growth constants) such that

(3.4)  $E \sup_{t \le T} |x^\Delta(t) - x(t)|^2 \le K_\Delta (1 + |x|^2)$.

($K_\Delta$ does not depend on $m(\cdot)$.)

Let $m^n(\cdot) \Rightarrow m(\cdot)$, where the $m^n(\cdot)$ are admissible with respect to some
Wiener process, and let $x^n(\cdot)$ satisfy (3.1) with $m(\cdot) = m^n(\cdot)$. Then $(x^n(\cdot),m^n(\cdot))
\Rightarrow (x(\cdot),m(\cdot))$, where $x(\cdot)$, $m(\cdot)$ satisfy (3.1) for some Wiener process $w(\cdot)$ and
$m(\cdot)$ is admissible with respect to $w(\cdot)$.

Proof. The existence and uniqueness proof for the relaxed control case
follows the same (standard) lines as when an admissible control $u(\omega,t)$ is used,
and is discussed by Fleming [7] and Fleming and Nisio [8]. The proofs of the
estimates (3.2), (3.4) also follow the classical lines. To get the weak
convergence in the last paragraph, it is sufficient to work with the discrete
parameter case (3.3), in view of the uniformity (in $m(\cdot)$) of $K$ and $K_\Delta$. But
the result is obvious for the discrete parameter case, owing to the continuity
of $\bar b(\cdot,\cdot)$ and the Lipschitz conditions and linear growth conditions. Q.E.D.
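
For readers who wish to experiment with (3.1), the following is a minimal simulation sketch. It rests on assumptions that are not in the text: the state is scalar, $U$ is replaced by a finite list `actions`, the relaxed control $m_t(\cdot)$ is represented by a probability vector `weights(t)` on that list, and an Euler step of size `dt` is used, which need not coincide with the approximation (3.3). The function name `euler_relaxed` is hypothetical.

```python
import numpy as np

def euler_relaxed(b, sigma, weights, actions, x0, T, dt, rng):
    """Euler path for dx = dt * sum_j w_j(t) b(x, a_j) + sigma(x) dw (scalar state)."""
    n = int(round(T / dt))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        w = weights(i * dt)  # m_t as a probability vector on `actions`
        # Drift averaged over the relaxed control: int b(x, a) m_t(da).
        drift = sum(wj * b(x[i], a) for wj, a in zip(w, actions))
        dw = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over one step
        x[i + 1] = x[i] + drift * dt + sigma(x[i]) * dw
    return x

# Illustrative use: a relaxed control splitting its mass equally between -1 and +1.
rng = np.random.default_rng(0)
path = euler_relaxed(b=lambda x, a: a - x, sigma=lambda x: 0.5,
                     weights=lambda t: (0.5, 0.5), actions=(-1.0, 1.0),
                     x0=1.0, T=1.0, dt=0.01, rng=rng)
print(path[-1])
```

With $b(x,\alpha) = \alpha - x$ and equal weights on $\pm 1$, the averaged drift is $-x$: the relaxed control acts through the averaged drift, not through any single value in $U$.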

For (3.1), define

$R(m) = E \int_0^{T_1} \int k(x(s),\alpha)\,m_s(d\alpha)\,ds$,

where $x(\cdot)$ corresponds to $m(\cdot)$ via (3.1). We sometimes write the solution to
(1.3) or (3.1) as $x(u,\cdot)$ or $x(m,\cdot)$.


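Theorem 2.

Under the above conditions for (3.1), there is an optimal control in the class
of admissible relaxed controls.

Proof. The theorem follows from Theorem 1. Simply choose a weakly
convergent subsequence $m^\delta(\cdot)$, $\delta \to 0$, such that $R(m^\delta) \to \inf_{m \in RC} R(m) = \bar R$. Denote
the limit of $\{x(m^\delta,\cdot), m^\delta(\cdot)\}$ by $(x(\bar m,\cdot), \bar m(\cdot))$. Then by Theorem 1, $\bar m(\cdot)$ is
admissible for some Wiener process $w(\cdot)$ and $(x(\bar m,\cdot), \bar m(\cdot), w(\cdot))$ solve (3.1).
By the weak convergence,

$E \int_0^{T_1} \int k(x^\delta(s),\alpha)\,m^\delta(ds \times d\alpha) \to E \int_0^{T_1} \int k(x(s),\alpha)\,\bar m(ds \times d\alpha) = \bar R = R(\bar m)$.

Q.E.D.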

Since we wish to show (in the following sections) that any smooth and
nearly optimal feedback control for (1.3) is a nearly optimal control of (1.1)
for small $\epsilon > 0$, it is important to know that there is a smooth nearly optimal
control for (1.3). This is shown in the next two theorems.

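The chattering lemma.

Theorem 3. For each $\delta > 0$, there is a piecewise constant admissible control
$u^\delta(\cdot)$ for (1.3) such that

$R(u^\delta) \le \inf_{m \in RC} R(m) + \delta$.

Remark. A proof is in [7], [8]. We only give a rough outline of the
construction. Let $\bar m(\cdot)$ be an optimal admissible relaxed control. Let $u_1^\rho, \ldots,
u_k^\rho$ be a $\rho$-grid in $U$. Define $A_1^\rho$ by $A_1^\rho = \{\alpha \in U: |\alpha - u_1^\rho| \le \rho\}$. For $k \ge n > 1$,
define

$A_n^\rho = \{\alpha \in U: |\alpha - u_n^\rho| \le \rho\} - \bigcup_{i<n} A_i^\rho$.

For $\Delta > 0$ and $i \ge 0$, define

$\tau_{in}^{\Delta\rho} = \int_{i\Delta}^{i\Delta+\Delta} \bar m_s(A_n^\rho)\,ds$,

the total integrated time that the optimal relaxed control ’takes values’ in the
set $A_n^\rho$ in the time interval $[i\Delta, i\Delta+\Delta)$. Define the piecewise constant admissible
control $u^\delta(\cdot)$ by $u^\delta(t) = u_0$ for $t < \Delta$, where $u_0$ is any value in $U$; in general,
set $u^\delta(t) = u_n^\rho$ on

$[(i+1)\Delta + \sum_{j<n} \tau_{ij}^{\Delta\rho},\ (i+1)\Delta + \sum_{j \le n} \tau_{ij}^{\Delta\rho})$,  $i \ge 0$, $n \le k$.

Then, for small $\rho$ and $\Delta$, $u^\delta(\cdot)$ satisfies our needs, even though the intervals of
constancy are random.

We can also get a control whose intervals of constancy are non-random. Let
$\Delta_1 > 0$ be such that $\Delta/\Delta_1 \equiv \bar k$ is a large integer, and write $k_{in}^{\Delta\rho} = [\tau_{in}^{\Delta\rho}/\Delta_1]$.
Then define $\tilde u^\delta(\cdot)$ as $u^\delta(\cdot)$ was defined but with $k_{in}^{\Delta\rho}\Delta_1$ replacing $\tau_{in}^{\Delta\rho}$, and
on the non-assigned set, simply set $\tilde u^\delta(t) = u_0$, where $u_0$ is any value in $U$. For
small $\Delta_1$, $\Delta$ and $\rho$, and large $\bar k$, $\tilde u^\delta(\cdot)$ also satisfies our needs.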
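
The construction in the Remark can be sketched in a few lines. The sketch below makes simplifying assumptions that are not in the text: the relaxed control is taken to be constant on each interval $[i\Delta, i\Delta+\Delta)$, with `weights[i][n]` standing for the mass it places on the grid cell containing the n-th grid point, so that the occupation time is just that mass times $\Delta$. The function names are hypothetical.

```python
import numpy as np

def occupation_times(weights, delta):
    """tau[i, n]: time the relaxed control spends on grid cell n during interval i.

    Assumes the mass weights[i][n] is constant on [i*delta, (i+1)*delta).
    """
    return np.asarray(weights, dtype=float) * delta

def chattering_schedule(weights, delta):
    """List of (start, end, n): apply grid point n on [start, end), one interval late."""
    tau = occupation_times(weights, delta)
    schedule = []
    for i, row in enumerate(tau):
        start = (i + 1) * delta  # the one-interval delay keeps the control non-anticipative
        for n, length in enumerate(row):
            if length > 0.0:
                schedule.append((start, start + length, n))
            start += length
    return schedule

# Two intervals of length 0.5 and a two-point grid (illustrative values only).
weights = [[0.25, 0.75], [1.0, 0.0]]
schedule = chattering_schedule(weights, delta=0.5)
for start, end, n in schedule:
    print(f"grid point {n} on [{start:.3f}, {end:.3f})")
```

Each interval's occupation times add up to $\Delta$, so the ordinary control spends the same total time at each grid point as the relaxed control did on the preceding interval, only in a fixed order.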


Theorem 4. For each $\delta > 0$, there is a piecewise constant (in $t$) and locally
Lipschitz continuous in $x$ (uniformly in $t$) control $\bar u^\delta(\cdot)$ such that

$R(\bar u^\delta) \le \inf_{m \in RC} R(m) + \delta$.

Proof. Fix $\delta > 0$. By the previous theorem, we can find a $\Delta > 0$ and an
admissible control $u^\delta(\cdot)$, constant on each interval $[i\Delta, i\Delta+\Delta)$, and such that

$R(u^\delta) \le \inf_{m \in RC} R(m) + \delta/4$.

By examining the imbedded Markov chain $\{x(i\Delta),\ i\Delta \le T_1\}$, we see that there is
an admissible control $\hat u^\delta(t)$ which is piecewise constant and has the form $\hat u^\delta(t)
= \hat u^\delta(x(i\Delta), i\Delta)$ for $t \in [i\Delta, i\Delta+\Delta)$ for some function $\hat u^\delta(x,t)$, and is such that

$R(\hat u^\delta) \le R(u^\delta)$.

In fact we can suppose that the $\hat u^\delta(t)$ take only a finite number of values
$u_1, \ldots, u_k$, where $k$ might depend on $\delta$ but not otherwise on $\Delta$. Let $\hat x(\cdot)$
denote the process corresponding to the control $\hat u^\delta(\cdot)$. Define $B_l^i = \{x:
\hat u^\delta(x,i\Delta) = u_l\}$. There are open sets $B_l'^i$ with smooth boundaries (say, unions of a
finite number of spheres) whose closures are disjoint and such that ($\partial B$
denotes the boundary of the set $B$, and $\triangle$ the symmetric difference)

(3.5)  $P\{\hat x(i\Delta) \in \partial B_l'^i\} = 0$, all $i, l$, $i\Delta \le T_1$,

       $\sum_{i=0}^{T_1/\Delta} P\{\hat x(i\Delta) \in \bigcup_l (B_l^i \triangle B_l'^i)\} \le \frac{\delta}{4T_1} \Big/ \Big[1 + \sup_{x,\alpha} |k(x,\alpha)|\Big]$.

For each $i$, define $\bar u^\delta(x,i\Delta)$ to equal $u_l$ on $B_l'^i$, and use any locally Lipschitz
continuous interpolation for $x \notin \bigcup_l B_l'^i$. Thus the costs with use of $\hat u^\delta(\cdot)$ (on
one hand) and use (on the other hand) of $\bar u^\delta(x(i\Delta),i\Delta)$ for $t \in [i\Delta,i\Delta+\Delta)$ and each
$i$ differ by at most $\delta/2$. In fact the latter control and $\hat u^\delta(\cdot)$ differ on a set
whose probability is less than $\delta/2$ plus the right side of (3.5). Define $\bar u^\delta(x,t) =
\bar u^\delta(x,i\Delta)$ for $t \in [i\Delta,i\Delta+\Delta)$.

For small $\Delta$, the $\bar u^\delta(\cdot)$ satisfy our needs. Q.E.D.

4. Weak Convergence of and Approximation of the Optimal Controls
for $x^\epsilon(\cdot)$

In this section we work with the control problem on $[0,T_1]$ and prove
(Theorem 5) that the weak limit of any (weakly convergent) sequence of
admissible relaxed controls for (4.1) is an admissible relaxed control for (3.1)
and that the corresponding costs converge. Then, in Theorem 6, we show that
any smooth 'nearly optimal' feedback control for (3.1) also is 'nearly
optimal' for (4.1) for small $\epsilon$.

Let $\delta_\epsilon \to 0$, and let $m^\epsilon(\cdot)$ be a $\delta_\epsilon$-optimal admissible relaxed control for
the process defined by

(4.1)  $\dot x^\epsilon = \int b(x^\epsilon,\alpha)\,m_t(d\alpha) + \tilde b(x^\epsilon,\xi^\epsilon) + g(x^\epsilon,\xi^\epsilon)/\epsilon$,

with cost function (1.2). For convenience, we define all $m(\cdot)$ on $[0,\infty)$. In the
analysis below it is convenient (but not necessary) to have $\int b(x^\epsilon(s),\alpha)\,m_t(d\alpha)$
right continuous (in order to be able to readily evaluate $\hat A^\epsilon$). Owing to the
Lipschitz, continuity and growth conditions, for each $\epsilon$ we can suppose
(w.l.o.g.) that $m_t^\epsilon(\cdot)$ is, in fact, constant on intervals $[i\Delta_\epsilon, i\Delta_\epsilon+\Delta_\epsilon)$ for small
enough $\Delta_\epsilon$.

Define $L^m$, the infinitesimal operator of $x(m,\cdot)$ defined by (3.1), by

$L^m f(x) = f_x'(x) \int \bar b(x,\alpha)\,m_t(d\alpha) + \frac{1}{2} \sum_{i,j} f_{x_i x_j}(x)\,a_{ij}(x)$.

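Theorem 5. Assume (A1)-(A3). Then $\{x^\epsilon(m^\epsilon,\cdot), m^\epsilon(\cdot)\}$ is tight in $D^r[0,\infty) \times M(\infty)$.
Let $m^\epsilon(\cdot) \Rightarrow m(\cdot)$. There is a $w(\cdot)$ such that $m(\cdot)$ is admissible with respect
to $w(\cdot)$ and $(x^\epsilon(m^\epsilon,\cdot), m^\epsilon(\cdot)) \Rightarrow (x(m,\cdot), m(\cdot))$, where

(4.2)  $dx = dt \int \bar b(x,\alpha)\,m_t(d\alpha) + \sigma(x)\,dw$.

Also,

$R^\epsilon(m^\epsilon) = E \int_0^{T_1} \int k(x^\epsilon(s),\alpha)\,m^\epsilon(ds \times d\alpha) \to E \int_0^{T_1} \int k(x(s),\alpha)\,m(ds \times d\alpha) = R(m)$.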

Proof. We first work with a truncated system, since tightness is easier to
prove if the $x^\epsilon(\cdot)$ paths are all bounded (see, e.g., [3], Chapter 3.3 or 4.6.4 or
[9]). Let $q_N(\cdot)$ be a twice continuously differentiable function satisfying $q_N(x)
= 1$ for $|x| \le N$, $q_N(x) = 0$ for $|x| \ge N+1$ and $q_N(x) \in [0,1]$ for all $x$. Define
$b_N(x,\alpha) = b(x,\alpha)q_N(x)$, $g_N(x,\xi) = g(x,\xi)q_N(x)$, etc., and let $x^{\epsilon,N}(\cdot)$ denote the
solution to (4.1) corresponding to the use of $b_N$, $\tilde b_N$, $g_N$, and $m^\epsilon(\cdot)$.

Part 1. Tightness of $\{x^{\epsilon,N}(\cdot)\}$

Since $U \times [0,t_1]$ is compact for each $t_1 < \infty$, $\{m^\epsilon(\cdot)\}$ is tight in $M(\infty)$. To
prove the tightness of $\{x^{\epsilon,N}(\cdot)\}$, we use the first order perturbed test function
method of [3, Chapter 3] (see also [9]). Let $f(\cdot) \in \hat C_0^2$. Then (write $x$ for
$x^{\epsilon,N}(t)$ for convenience)

$\hat A^\epsilon f(x) = f_x'(x) \left[ \int b_N(x,\alpha)\,m_t^\epsilon(d\alpha) + \tilde b_N(x,\xi^\epsilon(t)) + g_N(x,\xi^\epsilon(t))/\epsilon \right]$.

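For arbitrary $T < \infty$ and for $t \le T$, define $f_1^\epsilon(t) = f_1^\epsilon(x^{\epsilon,N}(t),t)$, where

$f_1^\epsilon(x,t) = \int_t^T f_x'(x)\,E_t^\epsilon g_N(x,\xi^\epsilon(s))\,ds/\epsilon = \epsilon \int_{t/\epsilon^2}^{T/\epsilon^2} f_x'(x)\,E_t^\epsilon g_N(x,\xi(s))\,ds$.

Under (A1a), $f_1^\epsilon(t) = O(\epsilon)$. Under (A1b), $f_1^\epsilon(t) = O(\epsilon)|\xi^\epsilon(t)|$. In either case

$\sup_{t \le T} |f_1^\epsilon(t)| \to 0$ in probability as $\epsilon \to 0$.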

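We have

$\hat A^\epsilon f_1^\epsilon(t) = -f_x'(x^{\epsilon,N}(t))\,g_N(x^{\epsilon,N}(t),\xi^\epsilon(t))/\epsilon$
$\qquad + \frac{1}{\epsilon} \int_t^T ds\,[f_x'(x^{\epsilon,N}(t))\,E_t^\epsilon g_N(x^{\epsilon,N}(t),\xi^\epsilon(s))]_x'\,\dot x^{\epsilon,N}(t)$.

Define $f^\epsilon(t) = f(x^{\epsilon,N}(t)) + f_1^\epsilon(t)$. Then, writing $x$ for $x^{\epsilon,N}(t)$, using the above
results and a scale change $s/\epsilon^2 \to s$,

(4.3)  $\hat A^\epsilon f^\epsilon(t) = f_x'(x) \int b_N(x,\alpha)\,m_t^\epsilon(d\alpha) + f_x'(x)\,\tilde b_N(x,\xi^\epsilon(t))$
$\qquad + \int_{t/\epsilon^2}^{T/\epsilon^2} ds\,E_t^\epsilon [f_x'(x)\,g_N(x,\xi(s))]_x'\,g_N(x,\xi^\epsilon(t))$
$\qquad + \epsilon \int_{t/\epsilon^2}^{T/\epsilon^2} ds\,E_t^\epsilon [f_x'(x)\,g_N(x,\xi(s))]_x' \left[ \int b_N(x,\alpha)\,m_t^\epsilon(d\alpha) + \tilde b_N(x,\xi^\epsilon(t)) \right]$.

Under (A1a), the second and third terms in (4.3) are $O(1)$. Under (A1b), they
are $O(1)[1 + |\xi^\epsilon(t)|^2]$. Under (A1a), the last term is $O(\epsilon)$, and under (A1b) it is
$O(\epsilon)[1 + |\xi^\epsilon(t)|^2]$. In either case the conditions of Theorem 0 hold. Hence
$\{x^{\epsilon,N}(\cdot)\}$ is tight in $D^r[0,\infty)$.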


Part 2.

Let $\epsilon$ index a weakly convergent subsequence with limit denoted by $x^N(\cdot)$,
$m(\cdot)$, i.e., $(x^{\epsilon,N}(\cdot), m^\epsilon(\cdot)) \Rightarrow (x^N(\cdot), m(\cdot))$. There is an $(\omega,t)$-measurable $m_t(\cdot)$
such that $m_t(U) = 1$ and

$\int_0^t \int f(s,\alpha)\,m_s(d\alpha)\,ds = \int_0^t \int f(s,\alpha)\,m(ds \times d\alpha)$

for each continuous $f(\cdot)$. This is a consequence of the fact that $m\{A \times [0,t]\}$
is absolutely continuous for each Borel $A$, uniformly in $\omega$, $A$, which implies
that the (measurable) limit

$\lim_{\Delta \to 0} [m\{A \times [0,t]\} - m\{A \times [0,t-\Delta]\}]/\Delta = m_t(A)$

exists for a.a. $(\omega,t)$ for each Borel $A$.

Define $L_N^m$ as $L^m$ was defined, but with the use of $b_N$ and $g_N$ instead of $b$
and $g$. Let $f(\cdot) \in \hat C_0^2$ and define $M_f^N(\cdot)$ by

$M_f^N(t) = f(x^N(t)) - f(x(0)) - \int_0^t L_N^m f(x^N(s))\,ds$.

We next show that $M_f^N(\cdot)$ is a martingale with respect to $B^N(t) = \sigma\{x^N(s),
m(A \times [0,s]),\ \text{Borel } A,\ s \le t\}$.

We know that $x^N(\cdot)$ has paths in $D^r[0,\infty)$, but we haven't yet proved that
the paths are in $C^r[0,\infty)$. There are at most a countable set of t-points such
that $P\{x^N(\cdot) \text{ is discontinuous at } t\} > 0$. Denote this set by $\hat T = \{\hat t_i\}$. In what
follows, until continuity is established, the $t_i$, $t$, $t+s$ do not take values in $\hat T$.
Let $h(\cdot)$ be bounded and continuous and let $t_i \le t < t+s$. Let $q_1$ and $q_2$ be
arbitrary integers and $k_j(\cdot)$ arbitrary bounded and continuous functions. By
(2.1), (2.2), and a change of scale ($s/\epsilon^2 \to s$) for one of the terms, we have

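(4.4)  $E\,h(x^{\epsilon,N}(t_i), (k_j,m^\epsilon)_{t_i},\ i \le q_1,\ j \le q_2) \Big\{ f(x^{\epsilon,N}(t+s)) - f(x^{\epsilon,N}(t)) + f_1^\epsilon(t+s) - f_1^\epsilon(t)$
$\qquad - \int_t^{t+s} \int f_x'(x^{\epsilon,N}(\tau))\,b_N(x^{\epsilon,N}(\tau),\alpha)\,m^\epsilon(d\tau \times d\alpha)$
$\qquad - \int_t^{t+s} E_t^\epsilon f_x'(x^{\epsilon,N}(\tau))\,\tilde b_N(x^{\epsilon,N}(\tau),\xi^\epsilon(\tau))\,d\tau$
$\qquad - \int_t^{t+s} d\tau \int_{\tau/\epsilon^2}^{T/\epsilon^2} E_\tau^\epsilon [f_x'(x^{\epsilon,N}(\tau))\,g_N(x^{\epsilon,N}(\tau),\xi(v))]_x'\,g_N(x^{\epsilon,N}(\tau),\xi^\epsilon(\tau))\,dv$
$\qquad + \text{terms which go to 0 in mean as } \epsilon \to 0 \Big\} = 0$.

Owing to (2.1) and (2.2), (4.4) holds with or without the $E_t^\epsilon$ term on the right
hand side. Recall that $(f,m)_t = \int_0^t \int f(s,\alpha)\,m(ds \times d\alpha)$.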

Now take limits ($\epsilon \to 0$) in (4.4) and use Skorohod imbedding ([10],
Theorem 3.1.1). The imbedding allows us to define the probability space so
that the weak convergence becomes w.p.1 in the topology of the space $D^r[0,\infty)
\times M(\infty)$. We use the imbedding without changing the notation, where
convenient. The $f_1^\epsilon$ terms in (4.4) disappear as $\epsilon \to 0$. Also by the weak
convergence and Skorohod imbedding,

$\int_t^{t+s} \int b_N(x^{\epsilon,N}(\tau),\alpha)\,m^\epsilon(d\tau \times d\alpha) \to \int_t^{t+s} \int b_N(x^N(\tau),\alpha)\,m(d\tau \times d\alpha)$,

$(k_j,m^\epsilon)_t \to (k_j,m)_t$,

w.p.1, uniformly on each finite interval. Next consider the second integral
term in (4.4). We will show that

(4.5)  $\lim_\epsilon E \left| \int_t^{t+s} E_t^\epsilon \tilde b_N(x^{\epsilon,N}(\tau),\xi^\epsilon(\tau))\,d\tau \right| = 0$.

Since $\{x^{\epsilon,N}(\cdot)\}$ is tight in $D^r[0,\infty)$ it is essentially a right equicontinuous set in
the following sense. Given $\rho > 0$ and $T < \infty$, there is a compact set $\Omega_\rho \subset
D^r[0,T]$ such that

$P\{x^{\epsilon,N}(\cdot) \in \Omega_\rho\} \ge 1 - \rho$.

For $y(\cdot) \in D^r[0,T]$, define $w_y[a,b) = \sup\{|y(s) - y(t)|:\ s,t \in [a,b)\}$ and define

$w_y'(\delta) = \inf_{\{t_i\}} \max_{i<q} w_y[t_i,t_{i+1})$,

where $0 = t_0 < \ldots < t_q = T$ and $t_{i+1} - t_i \ge \delta$.

Then [1, p. 116]

(4.6)  $\lim_{\delta \to 0} \sup_{y(\cdot) \in \Omega_\rho} w_y'(\delta) = 0$.

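Because of this 'equi right-continuity' characterization, to get the limit (4.5)
it is sufficient to evaluate

$\lim_{\Delta \downarrow 0} \limsup_\epsilon E \left| \int_t^{t+s} E_t^\epsilon \tilde b_N(x^{\epsilon,N}(\tau-\Delta),\xi^\epsilon(\tau))\,d\tau \right|$
$\qquad \le \lim_{\Delta \downarrow 0} \limsup_\epsilon E \int_t^{t+s} |E_{\tau-\Delta}^\epsilon \tilde b_N(x^{\epsilon,N}(\tau-\Delta),\xi^\epsilon(\tau))|\,d\tau \equiv \lim_{\Delta \downarrow 0} \limsup_\epsilon K_\Delta^\epsilon$.

There are constants $C_N$ and $C_N'$ depending only on $N$ such that, under
(A1a),

$|E_{\tau-\Delta}^\epsilon \tilde b_N(x^{\epsilon,N}(\tau-\Delta),\xi^\epsilon(\tau))| \le C_N\,\phi(\Delta/\epsilon^2)$,

and under (A1b)

$|E_{\tau-\Delta}^\epsilon \tilde b_N(x^{\epsilon,N}(\tau-\Delta),\xi^\epsilon(\tau))| \le C_N' \exp[-\lambda\Delta/\epsilon^2]\,|\xi^\epsilon(\tau-\Delta)|$,

where $\phi(\cdot)$ is the mixing rate (A1a) for $\xi(\cdot)$, and $\exp(-\lambda t)$ is a bound on the
norm of the correlation matrix (under (A1b)). Thus, under (A1a), $\lim_\epsilon K_\Delta^\epsilon =
0$ for each $\Delta > 0$. Under (A1b), $K_\Delta^\epsilon = O(\exp[-\lambda\Delta/\epsilon^2])\,E \int_t^{t+s} |\xi^\epsilon(\tau)|\,d\tau$. Thus (4.5)
holds.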

By a very similar technique we can show that, as $\epsilon \to 0$, the double integral
term in the brackets in (4.4) converges (in mean) to

(4.7)  $\int_t^{t+s} d\tau \int_0^\infty E[f_x'(x^N(\tau))\,g_N(x^N(\tau),\xi(v))]_x'\,g_N(x^N(\tau),\xi(0))\,dv$.

The expectation in (4.7) is over the $\xi(\cdot)$ only. The $x^N(\tau)$ is considered to be a
fixed parameter when taking the expectation. This last limit result is, in fact,
a special case of [3, Theorem 5.11]. Thus

(4.8)  $E\,h(x^N(t_i), (k_j,m)_{t_i},\ i \le q_1,\ j \le q_2) \left[ f(x^N(t+s)) - f(x^N(t)) - \int_t^{t+s} L_N^m f(x^N(\tau))\,d\tau \right] = 0$.

Since $q_1$, $q_2$, $h(\cdot)$ and the $k_j(\cdot)$, $t_i$, $t$, $s$ are arbitrary (with $t_i$, $t$, $t+s \notin \hat T = \{\hat t_i\}$),
the assertion that the $M_f^N(\cdot)$ are $\{B^N(t)\}$ martingales is proved.

It follows from the fact that $x^N(\cdot)$ solves the martingale problem in
$D^r[0,\infty)$ associated with the local operator $L_N^m$ that $x^N(\cdot)$ has continuous paths
w.p.1.

Part 3. Representation of the limit

Define $\sigma_N(x) = \sigma(x)q_N(x)$. Since the $M_f^N(\cdot)$ are martingales with respect to
$B^N(t)$, there is a standard Wiener process $w^N(\cdot)$ (augmenting the probability
space if necessary, via the addition of an independent Wiener process if $a(\cdot)$
is degenerate) such that $w^N(t)$ is $B^N(t)$ adapted, $x^N(\cdot)$ is nonanticipative with
respect to $w^N(\cdot)$ and

(4.9)  $dx^N = dt \int \bar b_N(x^N,\alpha)\,m_t(d\alpha) + \sigma_N(x^N)\,dw^N$.

Also, since $w^N(\cdot)$ is $B^N(t)$ adapted, the $m(A \times [0,t])$ and $m_t(A)$ are
non-anticipative with respect to $w^N(\cdot)$. Hence $m(\cdot)$ is an admissible relaxed
control for the problem with coefficients $\bar b_N$, $\sigma_N$.

Define $\tau_N = \min\{t:\ |x^N(t)| \ge N\}$. Let $w(\cdot)$ be any Wiener process such that
$m(\cdot)$ is non-anticipative with respect to $w(\cdot)$. For this pair (4.2) has a unique
solution whose distributions do not depend on the particular $w(\cdot)$ (and with no
explosion w.p.1 on any bounded time interval). So does the system (4.9) with
$w^N(\cdot)$ replaced by $w(\cdot)$. Replace $w^N(\cdot)$ in (4.9) by $w(\cdot)$. Then the sets $\{x^N(t \wedge \tau_N),
m(A \times [0,t]),\ \text{Borel } A,\ t < \infty\}$ and $\{x(t \wedge \tau_N), m(A \times [0,t]),\ \text{Borel } A,\ t < \infty\}$
have the same distributions. Since $P\{\tau_N \le T\} \to 0$ as $N \to \infty$ for each $T < \infty$, we
then have that $\{x^\epsilon(\cdot), m^\epsilon(\cdot)\}$ is tight and converges weakly to a solution of
(4.2).

The last assertion of the theorem follows from the weak convergence
$(x^\epsilon(\cdot), m^\epsilon(\cdot)) \Rightarrow (x(\cdot), m(\cdot))$, and the continuity of the process $x(\cdot)$. Q.E.D.

Remark. With a simpler proof (not requiring working with $\{m^\epsilon(\cdot)\}$) we
have the following. Let $u(\cdot)$ be a (time-dependent) feedback control which is
continuous in $x$, uniformly in $t$ on each bounded $(x,t)$ set, and for which the
martingale problem associated with (1.3) has a unique solution. Then $x^\epsilon(u,\cdot)
\Rightarrow x(u,\cdot)$. Also $R^\epsilon(u) \to R(u)$.

Theorem 6. Assume (A1)-(A3). Then for each $\delta > 0$, there is a Lipschitz
continuous (uniformly in t) control $u^\delta(\cdot)$ such that

(4.10)  $\limsup_{\epsilon} \left[ R^\epsilon(u^\delta) - \inf_{m \in RC^\epsilon} R^\epsilon(m) \right] \le \delta.$

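Proof. Use the $u^\delta(\cdot)$ of Theorem 4. By the weak convergence argument
of Theorem 5, $x^\epsilon(u^\delta,\cdot) \Rightarrow x(u^\delta,\cdot)$ and
$R^\epsilon(u^\delta) \to R(u^\delta)$. The theorem follows from this since

$R^\epsilon(m^\epsilon) \to R(m) \ge \inf_{m \in RC} R(m) \ge R(u^\delta) - \delta.$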

5.  The Discrete Parameter Case

An  advantage  of  the  weak  convergence  point  of  view  is  that  the  discrete 
parameter  case  can  be  treated  in  almost  the  same  way  as  the  continuous 
parameter  case. 


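Let the system be given by

(5.1)  $x^\epsilon_{n+1} = x^\epsilon_n + \epsilon \int b(x^\epsilon_n,\alpha)\, m_n(d\alpha) + \epsilon\, \tilde b(x^\epsilon_n,\xi_n) + \sqrt{\epsilon}\, g(x^\epsilon_n,\xi_n),$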


where $\{\xi_n\}$ satisfies the discrete parameter form of (A1a) or (A1b) and
the conditions on $g(\cdot)$, $b(\cdot)$, $\tilde b(\cdot)$ and $k(\cdot)$ in
(A2)-(A3) hold. Also, assume that the discrete parameter relaxed control
$m_n(\cdot)$ depends on $\{\xi_{j-1}, x^\epsilon_j,\ j \le n\}$ only. For any
admissible relaxed control $m(\cdot)$ for (3.1), define the infinitesimal
operator $L^m$ by (which implicitly defines $\bar b(\cdot)$ and
$\sigma(\cdot)$)

Imf(x)  =  P(x)  |b(x,«)mt(da) 

(5.2)  +  ^  E  E[fJ|(x)g(X,^n)J3;  g(x,40) 

=  f'(x)  |  b(x,a)mt(da)  +  -,E.fx.x.  (x)aij(x). 

The  discrete  parameter  case  can  easily  be  put  into  the  framework  of  the 
last  section.  The  optimal  policy  for  the  discrete  parameter  case  would  not 
usually  be  'relaxed',  but  it  is  convenient  to  represent  it  as  a  relaxed  control, 
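since the limit controls might be relaxed. Define $x^\epsilon(\cdot)$ by
$x^\epsilon(t) = x^\epsilon_n$ on $[n\epsilon, n\epsilon+\epsilon)$, and define
$m(\cdot)$ by

(5.3)  $m(A \times [0,t]) = \epsilon \sum_{i=0}^{[t/\epsilon]-1} m_i(A) + (t - \epsilon[t/\epsilon])\, m_{[t/\epsilon]}(A).$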
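The interpolations can be sketched in code. The following is a minimal scalar illustration, assuming a finite control set (so that each $m_i$ is a probability vector) and reading (5.3) as the relaxed control whose time derivative equals $m_n$ on $[n\epsilon, n\epsilon+\epsilon)$. The function names are hypothetical.

```python
import math

def step(x, eps, drift, g_val):
    """One step of x_{n+1} = x_n + eps*drift + sqrt(eps)*g_val (scalar case).

    `drift` stands for the sum of the two O(eps) terms of (5.1) evaluated at x.
    """
    return x + eps * drift + math.sqrt(eps) * g_val

def interpolate(xs, eps, t):
    """Piecewise-constant interpolation: x^eps(t) = x_n on [n*eps, n*eps + eps)."""
    return xs[int(t // eps)]

def relaxed_mass(weights, eps, t):
    """Return m({a} x [0,t]) for each control value a, following (5.3).

    weights[i][a] is m_i({a}), a probability vector over a finite control set.
    Floor division on floats is used, so eps and t should be exactly
    representable (or integer step counts should be used instead).
    """
    n = int(t // eps)
    total = [0.0] * len(weights[0])
    for i in range(n):                      # full intervals before time t
        for a, w in enumerate(weights[i]):
            total[a] += eps * w
    for a, w in enumerate(weights[n]):      # partial last interval
        total[a] += (t - eps * n) * w
    return total
```

Since each $m_i$ has total mass one, the entries returned by `relaxed_mass` sum to t, which is the normalization required of a relaxed control.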


Let $\delta^\epsilon \to 0$ and let $m^\epsilon(\cdot)$ be a
$\delta^\epsilon$-optimal control for (5.1).


Theorem 7. Under the conditions of this section, Theorems 5 and 6
hold for the discrete parameter case.

Remark. The proof is nearly identical to that of Theorems 5 and 6. One
uses the discrete parameter versions (in [3]) of the theorems which were cited
from that reference and the definition of $A^\epsilon f(n\epsilon)$ and
$E^\epsilon_n$ given in Section 2.


6.  Average  Cost  Per  Unit  Time 


In this section, $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$ will be a
Markov-Feller process with a stationary transition function when the control is
of the feedback form $u(x,\xi)$, and $\xi^\epsilon(\cdot)$ is a Markov-Feller
process. Let PM denote the class of U-valued functions of x for which (1.3) has
a unique (weak sense) solution for each initial condition, and let
$PM^\epsilon$ denote the class of U-valued continuous functions of $(x,\xi)$
for which the corresponding $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$ is a
Markov-Feller process (e.g., $PM^\epsilon$ includes all U-valued locally
Lipschitz continuous functions). We work with (6.1), the same system dealt with
in the previous section.

(6.1)  $\dot x^\epsilon = b(x^\epsilon,u) + \tilde b(x^\epsilon,\xi^\epsilon) + g(x^\epsilon,\xi^\epsilon)/\epsilon.$

Let SR denote the class of stationary admissible relaxed controls for (3.1)
such that for each $m(\cdot) \in SR$, there is a process $x(m,\cdot)$ where the
pair $(x(m,\cdot), m(\cdot))$ is stationary, and define $SR^\epsilon$
analogously for (6.1). When writing $\inf_{m \in SR} F(x(\cdot))$ for some
function $F(\cdot)$, we infimize the functional values over these stationary
pairs $(x(m,\cdot), m(\cdot))$.

The cost function (for a relaxed admissible control) is

$\limsup_T \frac{1}{T} \int_0^T \int E\, k(x^\epsilon(t),\alpha)\, m_t(d\alpha)\, dt = \gamma^\epsilon(m)$

and, for a feedback control,

$\limsup_T \frac{1}{T} \int_0^T E\, k(x^\epsilon(t), u(x^\epsilon(t),\xi^\epsilon(t)))\, dt = \gamma^\epsilon(u).$

We define the costs $\gamma(u)$ and $\gamma(m)$ for the controlled diffusion
$x(\cdot)$ in the analogous way.

It is convenient to start our analysis with some additional assumptions.
They will be discussed and sufficient conditions given for them in the next
section.

(C1)-(C4) hold in very many cases of interest. (C1) and (C3) are basically
uniform (in the control) recurrence conditions. They certainly hold if the
$x^\epsilon(t)$ are confined to a compact set. But, more generally, if the
system has a stability property for large |x|, then it can often be exploited
to get (C1) and (C3). See Section 7.3. Also, a nearly optimal stabilizing
control for (1.3) is often a stabilizing control for (6.1).

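C1. There is $\epsilon_0 > 0$ such that for each $\delta > 0$, there are
$\delta$-optimal controls $u^{\epsilon,\delta}(\cdot,\cdot) \in PM^\epsilon$
such that $\{x^\epsilon(u^{\epsilon,\delta},t),\ t < \infty,\ \epsilon < \epsilon_0\}$
is tight in $R^r$.

C2. For each $\delta > 0$, there is a continuous $\delta$-optimal control
$u^\delta(\cdot)$ in PM for (1.3) for which (1.3) has a unique invariant
measure $\mu^\delta(\cdot)$, and such that $u^\delta(\cdot) \in PM^\epsilon$
for small $\epsilon$.

C3. For the $u^\delta(\cdot)$ in (C2),
$\{x^\epsilon(u^\delta,t),\ t < \infty,\ \epsilon > 0\}$ is tight in $R^r$.

C4. $\inf_{u \in PM} \gamma(u) = \inf_{m \in SR} \gamma(m).$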

Theorem 8 says that if $u^\delta(\cdot)$ is a $\delta$-optimal control for the
diffusion, then its use with the $x^\epsilon(\cdot)$ gives a nearly
($3\delta$-optimal) result for small $\epsilon$.

Theorem 8. Assume (A1)-(A3) and (C1)-(C4). Then for each $\delta > 0$, and
small $\epsilon$,

(6.2)  $\gamma^\epsilon(u^\delta) \le \inf_{u \in PM^\epsilon} \gamma^\epsilon(u) + 3\delta.$

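Proof. Fix $\delta > 0$. $u^\delta(\cdot)$ will be the function defined in
(C2), and $u^{\epsilon,\delta}(\cdot)$ will be the function defined in (C1).
Let $P^{\epsilon,\delta}(x,\xi,t,\cdot)$ denote the transition function for the
Markov-Feller process $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$, under the
control $u^{\epsilon,\delta}(\cdot)$. Define the measures

$P_T^{\epsilon,\delta}(\cdot) = \frac{1}{T} E \int_0^T P^{\epsilon,\delta}(x^\epsilon(0),\xi^\epsilon(0),t,\cdot)\, dt,$

where the average E is over the possibly random initial condition
$(x^\epsilon(0), \xi^\epsilon(0))$. Then

(6.3)  $\gamma^\epsilon(u^{\epsilon,\delta}) = \limsup_T \int P_T^{\epsilon,\delta}(dx \times d\xi)\, k(x, u^{\epsilon,\delta}(x,\xi)).$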

Let $\xi^\epsilon(t)$ take values in $R^k$, and let M(0) denote the set of
probability measures on $R^{r+k}$ with the weak topology. By (C1), the set of
M(0)-valued measures $\{P_T^{\epsilon,\delta}(\cdot),\ T < \infty\}$ is in a
compact set in M(0). It follows from Beneš [11] that the limit of any weakly
convergent (in the topology of M(0)) subsequence is an invariant measure for
$(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$, with the control
$u^{\epsilon,\delta}(\cdot)$ used.

Let $T_n \to \infty$ be a sequence such that it yields the $\limsup_T$ in (6.3)
and also $P_{T_n}^{\epsilon,\delta}(\cdot)$ converges weakly to an invariant
measure $\mu^{\epsilon,\delta}(\cdot)$ for
$(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$. Thus

$\gamma^\epsilon(u^{\epsilon,\delta}) = \int k(x, u^{\epsilon,\delta}(x,\xi))\, \mu^{\epsilon,\delta}(dx \times d\xi).$

Let $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$ denote a stationary process
corresponding to the invariant measure $\mu^{\epsilon,\delta}(\cdot)$.
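The role of the time-averaged measures can be seen in a much simpler setting. The sketch below is only an analogy and is not the process of the paper: it uses a discrete-time, finite-state Markov chain with a hypothetical transition matrix `P`, forms the Cesàro averages of its marginal distributions (the counterpart of the averaged measures above), and checks that they approach an invariant measure, so that the average cost becomes an integral of the cost against that measure.

```python
import numpy as np

def cesaro_occupation(P, pi0, T):
    """Return (1/T) * sum_{t<T} pi0 P^t, a discrete analogue of the
    time-averaged transition measures."""
    pi = np.array(pi0, dtype=float)
    acc = np.zeros_like(pi)
    for _ in range(T):
        acc += pi          # accumulate the marginal distribution at time t
        pi = pi @ P        # advance one step
    return acc / T

# Hypothetical two-state chain and per-state cost, for illustration only.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
cost = np.array([0.0, 6.0])

mu_T = cesaro_occupation(P, [0.0, 1.0], 5000)
average_cost = float(mu_T @ cost)   # analogue of integrating k against mu
```

For this chain the invariant measure is (5/6, 1/6), and the averaged measure is within O(1/T) of it regardless of the initial distribution.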

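Write the control $u^{\epsilon,\delta}(\cdot)$ for
$(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$ in the form of a relaxed control,
which we call $m^{\epsilon,\delta}(\cdot)$, with derivative
$m_t^{\epsilon,\delta}(\cdot)$. Let $m_\cdot^{\epsilon,\delta}$ denote the
measure valued process which is the time derivative of
$m^{\epsilon,\delta}(\cdot \times [0,t])$. Then the pair (state, relaxed
control derivative) of processes $(x^\epsilon(\cdot), m_\cdot^{\epsilon,\delta})$
is stationary. Alternatively, for any sequence $\{t_i\}$ and set of increasing
numbers $\{s_j\}$, the distributions of
$(x^\epsilon(t+t_i),\ m^{\epsilon,\delta}(\cdot \times [s_j+t, s_{j+1}+t]),\ i,j)$
do not depend on t. By the stationarity, we can write

(6.4)  $\gamma^\epsilon(u^{\epsilon,\delta}) = E \int_0^1 dt \int k(x^\epsilon(t),\alpha)\, m_t^{\epsilon,\delta}(d\alpha).$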


By (C1), the collection of invariant measures
$\{\mu^{\epsilon,\delta}(\cdot),\ \epsilon > 0\}$ lies in a compact set in
M(0). Thus, by Theorem 5, $\{x^\epsilon(\cdot), m^{\epsilon,\delta}(\cdot)\}$
is tight in $D^r[0,\infty) \times M(\infty)$. Let $\epsilon$ index a weakly
convergent subsequence with limit $(x(\cdot), m^\delta(\cdot))$. The limit is
of the form (4.2), with the admissible $m^\delta(\cdot)$ replacing the
$\hat m(\cdot)$ there. Let $m_\cdot^\delta$ denote the measure-valued process
which is the time derivative of $m^\delta(\cdot \times [0,t])$. By the
stationarity of $(x^\epsilon(\cdot), m_\cdot^{\epsilon,\delta})$, the limit
pair (state, relaxed control derivative), $(x(\cdot), m_\cdot^\delta)$, is also
stationary, and by the weak convergence

(6.5)  $\gamma^\epsilon(u^{\epsilon,\delta}) \to E \int_0^1 dt \int k(x(t),\alpha)\, m_t^\delta(d\alpha).$

Owing to the stationarity of $(x(\cdot), m_\cdot^\delta)$, the right side of
(6.5) equals

(6.6)  $\gamma(m^\delta) = \lim_T \frac{1}{T} E \int_0^T dt \int k(x(t),\alpha)\, m_t^\delta(d\alpha).$

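We now apply $u^\delta(\cdot)$ to $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$.
Define $\bar P_T^{\epsilon,\delta}(\cdot)$ as $P_T^{\epsilon,\delta}(\cdot)$
was defined, but with $(x^\epsilon(u^\delta,\cdot), \xi^\epsilon(\cdot))$ used.
Choose $T_n \to \infty$ such that
$\bar P_{T_n}^{\epsilon,\delta}(\cdot) \Rightarrow \bar\mu^{\epsilon,\delta}(\cdot)$,
an invariant measure for $(x^\epsilon(u^\delta,\cdot), \xi^\epsilon(\cdot))$,
and such that

$\gamma^\epsilon(u^\delta) = \lim_n \int \bar P_{T_n}^{\epsilon,\delta}(dx \times d\xi)\, k(x, u^\delta(x)).$

Let $(x^\epsilon(\cdot), \xi^\epsilon(\cdot))$ denote the stationary process
corresponding to the invariant measure $\bar\mu^{\epsilon,\delta}(\cdot)$ and
control $u^\delta(\cdot)$.

By (C3), $\{\bar\mu^{\epsilon,\delta}(\cdot),\ \epsilon > 0\}$ lies in a
compact set in M(0). Then, by Theorem 5, $\{x^\epsilon(\cdot)\}$ is tight in
$D^r[0,\infty)$. Let $\epsilon$ index a weakly convergent subsequence with
limit $x(\cdot)$, and control $u^\delta(\cdot)$. Then $x(\cdot)$ is stationary
and is, in fact, the unique stationary process of the form (1.3) corresponding
to the control $u^\delta(\cdot)$. We have, by Theorem 5,

(6.7)  $\gamma^\epsilon(u^\delta) = E \int_0^1 k(x^\epsilon(t), u^\delta(x^\epsilon(t)))\, dt \to E \int_0^1 k(x(t), u^\delta(x(t)))\, dt = \gamma(u^\delta).$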

Also by the definition of $u^\delta(\cdot)$ and (C4),

(6.8)  $\gamma(u^\delta) \le \inf_{u \in PM} \gamma(u) + \delta,$
       $\inf_{u \in PM} \gamma(u) = \inf_{m \in SR} \gamma(m) \le \gamma(m^\delta).$

The theorem follows from inequalities (6.8) and the convergence in (6.5) to
(6.7). Q.E.D.
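The way the constant $3\delta$ arises can be made explicit. The chain below is a sketch of the bookkeeping, with $\epsilon$ restricted to the convergent subsequences used in the proof; the split of the convergence errors into two pieces of size $\delta/2$ is a choice made here for illustration:

```latex
\begin{align*}
\gamma^\epsilon(u^\delta)
  &\le \gamma(u^\delta) + \delta/2
     && \text{by (6.7), for small } \epsilon \\
  &\le \inf_{u \in PM} \gamma(u) + 3\delta/2
     && \text{by (6.8)} \\
  &\le \gamma(m^\delta) + 3\delta/2
     && \text{by (6.8)} \\
  &\le \gamma^\epsilon(u^{\epsilon,\delta}) + 2\delta
     && \text{by (6.5) and (6.6), for small } \epsilon \\
  &\le \inf_{u \in PM^\epsilon} \gamma^\epsilon(u) + 3\delta
     && \text{by the } \delta\text{-optimality in (C1)} .
\end{align*}
```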
7.  On (C1)-(C4)

7.1. Consider (C4) first. Let there be an optimal (average cost per unit
time) policy $u(\cdot)$ in PM for (1.3) and such that the associated diffusion
$x(\cdot)$ has a unique invariant measure which we denote by $\mu^u(\cdot)$.
Let the potential

$C(x) = \int_0^\infty E_x [k(x(s), u(x(s))) - \gamma]\, ds$

and constant $\gamma$ satisfy the Bellman equation

(7.1)  $\gamma = \min_{u \in U} [L^u C(x) + k(x,u)].$

See [12] for one set of conditions guaranteeing this. Let $m(\cdot) \in SR$,
with the associated stationary process $x(m,\cdot) = x^m(\cdot)$ and stationary
measure $\mu^m(\cdot)$, where $x^m(\cdot)$ satisfies (4.2) for
$\hat m(\cdot) = m(\cdot)$. Suppose that for any such $m(\cdot)$ with finite
$\gamma(m)$,

(7.2)  $\int |C(x)|\, \mu^m(dx) < \infty.$

Then (7.1) implies that for any $T < \infty$

$\gamma T \le E\,C(x^m(T)) - E\,C(x^m(0)) + E \int_0^T \int k(x^m(t),\alpha)\, m_t(d\alpha)\, dt.$

Then, by the stationarity of $x^m(\cdot)$, $\gamma \le \gamma(m)$, and (C4)
holds. A sufficient condition for (7.2) will be given in Subsection 7.3 below.
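The intermediate steps can be written out as follows. This is a sketch which writes $\alpha$ for a generic point of U and assumes that $C(\cdot)$ is smooth enough for Itô's formula to apply and that the resulting stochastic integral has zero expectation:

```latex
\begin{align*}
\gamma &\le L^\alpha C(x) + k(x,\alpha)
        \quad \text{for all } x \text{ and } \alpha \in U, \\
\gamma &\le \int \bigl[ L^\alpha C(x^m(t)) + k(x^m(t),\alpha) \bigr]\, m_t(d\alpha), \\
\gamma T &\le E \int_0^T\!\! \int L^\alpha C(x^m(t))\, m_t(d\alpha)\, dt
          + E \int_0^T\!\! \int k(x^m(t),\alpha)\, m_t(d\alpha)\, dt \\
         &= E\,C(x^m(T)) - E\,C(x^m(0))
          + E \int_0^T\!\! \int k(x^m(t),\alpha)\, m_t(d\alpha)\, dt .
\end{align*}
```

By stationarity and (7.2), $E\,C(x^m(T)) = E\,C(x^m(0))$ is finite and the last integral equals $T\gamma(m)$, which gives $\gamma \le \gamma(m)$.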
7.2.  On Condition (C2).

We use results from [13], where the system $\dot x = b(x,u)$ was assumed to
have a stability property, uniformly in $u(\cdot) \in PM$. Write
$b(x,u) = B(x) + \tilde B(x,u)$, where $B(\cdot)$ and $\tilde B(\cdot)$ satisfy
the conditions on $b(\cdot)$ in (A2), and $\tilde B(\cdot)$ and $\sigma(\cdot)$
are bounded, $k(\cdot,\cdot)$ is bounded and continuous, and $\{a_{ij}(x)\}$ is
uniformly positive definite and satisfies (A3). The model is such that the
stabilizing effects of $B(\cdot)$ overpower the effects of $\tilde B(x,u)$ for
large |x|. This, together with the positive definiteness, will essentially
guarantee (C4). To quantify the stability property for large |x|, let there be
a twice continuously differentiable function $V(\cdot)$ such that
$0 \le V(x) \to \infty$ as $|x| \to \infty$ and, for some compact set K and
$\theta > 0$, $L^u V(x) \le -\theta$ for $x \notin K$ and all
$u(\cdot) \in PM$. ($L^u$ is the differential generator of (1.3).) Let there
be $c > 0$, $\alpha > 0$, $q(x) \ge 0$ such that $L^u V^2(x) \le c - q(x)$,
where $\inf_x q(x)/V(x) \ge \alpha$. Typically $V(\cdot)$ would be a Liapunov
function for the system $\dot x = B(x)$; e.g., if $B(x) = Ax$ where A is
stable, then for $Q > 0$, P can be defined by $A'P + PA = -Q$, and we use the
Liapunov function $V(x) = x'Px$. Note that our c and V(x) are called $c_2$ and
$W_1(x)$ in [13].
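For the linear example, the matrix P can be computed directly. The sketch below solves $A'P + PA = -Q$ by rewriting it as an ordinary linear system using Kronecker products; the matrices `A` and `Q` are hypothetical values chosen for illustration, and the function name is not from the paper.

```python
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A'P + PA = -Q for P by vectorizing the linear equation."""
    n = A.shape[0]
    I = np.eye(n)
    # With row-major flattening, vec(A'P) = kron(A', I) vec(P) and
    # vec(PA) = kron(I, A') vec(P).
    K = np.kron(A.T, I) + np.kron(I, A.T)
    P = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

# Hypothetical stable matrix (eigenvalues -1 and -2) and Q = I.
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
Q = np.eye(2)
P = solve_lyapunov(A, Q)

def V(x):
    """Quadratic Liapunov function V(x) = x'Px."""
    return float(x @ P @ x)
```

Along $\dot x = Ax$ the derivative of V is $x'(A'P + PA)x = -x'Qx$, which is the decay property the stability conditions are meant to capture.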

Under the above conditions, Theorems 3.1, 4.2, 4.3 and the proof of
Theorem 4.4 of [13] imply the following facts: To any $u(\cdot) \in PM$, there
is a unique invariant measure $\mu^u(\cdot)$ for (1.3) and
$\{\mu^u(\cdot),\ u(\cdot) \in PM\}$ is in a compact set in M(0); let
$\tilde u^\delta(\cdot)$ be a $\delta/2$-optimal control in PM, smooth or not,
and let

(7.3)  $u^n(x) \to \tilde u^\delta(x)$ in $L_1(dx)$, $u^n(\cdot) \in PM.$

Then for each Borel set A,
$\mu^{u^n}(A) \to \mu^{\tilde u^\delta}(A)$ and

$\int k(x, u^n(x))\, \mu^{u^n}(dx) \to \int k(x, \tilde u^\delta(x))\, \mu^{\tilde u^\delta}(dx).$

These facts imply that for any given $\delta/2$-optimal
$\tilde u^\delta(\cdot)$, there is a locally Lipschitz continuous
$u^\delta(\cdot)$ such that

$\gamma(u^\delta) - \gamma(\tilde u^\delta) < \delta/2.$

Reference [13] uses a convexity condition ((A3) there) on the set
$\{b(x,U), k(x,U)\}$ and on U. But this convexity condition was used only to
prove the existence of an optimal control. The $\delta/2$-optimal control
always exists.


7.3.  On the Assumption (7.2)

Again, we use results of [13]. Let $C(\cdot)$ satisfy (7.1) and assume the
conditions of Subsection 7.2. Then [13, proof of Lemma 5.1],

$|C(x)| \le K(1 + V(x)),$

for some $K < \infty$ (our C(x) is called $V^u(x)$ in [13]). Adapting the proof
of [13, Lemma 5.1] to our 'relaxed' control case and using the c and $\alpha$
of Subsection 7.2, we get for any $M < \infty$ and relaxed control $m(\cdot)$,

$c \ge \lim_T \frac{1}{T} \int_0^T \alpha E \min[M, V(x^m(s))]\, ds.$

By the stationarity, the right side equals $\alpha E \min[M, V(x^m(0))]$.
Since M is arbitrary and c does not depend on $m(\cdot)$, (7.2) holds.
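In more detail, the final step combines the stationarity bound with the growth estimate on $C(\cdot)$; the passage to the limit in M uses monotone convergence, since $V \ge 0$:

```latex
\begin{align*}
\alpha\, E \min[M, V(x^m(0))] &\le c \quad \text{for every } M < \infty, \\
E\, V(x^m(0)) = \lim_{M \to \infty} E \min[M, V(x^m(0))] &\le c/\alpha, \\
\int |C(x)|\, \mu^m(dx) \le K \bigl( 1 + E\, V(x^m(0)) \bigr)
  &\le K (1 + c/\alpha) < \infty .
\end{align*}
```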

On (C1), (C3)

Under a suitable stability condition on the limit system $x(\cdot)$, both (C1)
and (C3) can be shown via a perturbed Liapunov function method. In
particular, we use some of the results of [3, Chapter 6.6] and [14]. We use the
form $b(x,u) = B(x) + \tilde B(x,u)$ and

(7.5)  $\dot x^\epsilon = B(x^\epsilon) + \tilde B(x^\epsilon,u) + \tilde b(x^\epsilon,\xi^\epsilon) + g(x^\epsilon,\xi^\epsilon)/\epsilon$

and (A2), (A3), (A1a). Assume that $B(\cdot)$ and $\tilde B(\cdot)$ satisfy the
conditions on $b(\cdot)$ in (A2). Analogous results can be obtained under
(A1b), via the method in [3, Chapter 6.8]. We require the existence of a
Liapunov function $V(\cdot)$ satisfying certain inequalities. In applications,
the assumptions are essentially equivalent to $B(\cdot)$ strongly dominating
the effects of the other terms for large |x|.

We begin with an adaptation of a perturbed Liapunov function method of
[14], but with a simpler perturbation. Let $V(\cdot)$ be a twice continuously
differentiable non-negative function such that $V(x) \to \infty$ as
$|x| \to \infty$ and (D1)-(D4) hold. The K below are constants.

D1. There are $\alpha > 0$, $c < \infty$, such that
$V_x'(x) B(x) \le -\alpha V(x) + c$ and
$|V_x'(x) \tilde B(x,u)|/V(x) \to 0$ as $|x| \to \infty$.

D2. $|V_x'(x) g(x,\xi)| + |V_x'(x) \tilde b(x,\xi)| \le K(1 + V(x)).$

D3. $|(V_x'(x) q(x))_x'\, p(x)| \le K(1 + V(x))$, for the pairs
$q(\cdot) = \tilde b(\cdot)$, $p(\cdot) = B(\cdot), \tilde B(\cdot), \tilde b(\cdot)$ and $g(\cdot)$, and
$q(\cdot) = g(\cdot)$, $p(\cdot) = B(\cdot), \tilde B(\cdot), \tilde b(\cdot)$.

D4. $|[V_x'(x) g(x,\xi)]_x'\, g(x,\xi)|/V(x) \to 0$ as $|x| \to \infty$.

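Define $V_1^\epsilon(t) = V_1^\epsilon(x^\epsilon(t),t)$, where

(7.6)  $V_1^\epsilon(x,t) = \int_t^\infty V_x'(x) E_t^\epsilon \tilde b(x,\xi^\epsilon(s))\, ds + \frac{1}{\epsilon} \int_t^\infty V_x'(x) E_t^\epsilon g(x,\xi^\epsilon(s))\, ds.$

By a change of scale $s/\epsilon^2 \to s$ and (A1a), (D2), we get that the
first term is $O(\epsilon^2)[1 + V(x)]$ and the second is
$O(\epsilon)[1 + V(x)]$. Define the perturbed Liapunov function
$V^\epsilon(t) = V(x^\epsilon(t)) + V_1^\epsilon(t)$. Then (write x for
$x^\epsilon(t)$ and $\dot x^\epsilon$ for $\dot x^\epsilon(t)$, where
convenient)

$A^\epsilon V(x) = V_x'(x)[B(x) + \tilde B(x,u) + \tilde b(x,\xi^\epsilon(t)) + g(x,\xi^\epsilon(t))/\epsilon],$

$A^\epsilon V_1^\epsilon(x,t) = -V_x'(x) \tilde b(x,\xi^\epsilon(t)) - \frac{1}{\epsilon} V_x'(x) g(x,\xi^\epsilon(t))$
$\qquad + \int_t^\infty ds\, [V_x'(x) E_t^\epsilon \tilde b(x,\xi^\epsilon(s))]_x'\, \dot x^\epsilon$
$\qquad + \frac{1}{\epsilon} \int_t^\infty ds\, [V_x'(x) E_t^\epsilon g(x,\xi^\epsilon(s))]_x'\, \dot x^\epsilon.$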

By using the scale change $s/\epsilon^2 \to s$, (A1a) and (D1) to (D4), we get
that there is a function $h(x) \ge 0$ such that $h(x)/V(x) \to 0$ as
$|x| \to \infty$ and such that

(7.7)  $A^\epsilon V^\epsilon(t) \le -\alpha V(x^\epsilon(t)) + h(x^\epsilon(t)).$

By the bound on $V_1^\epsilon(x,t)$ below (7.6), we can write (for small
$\epsilon > 0$)

(7.8)  $A^\epsilon V^\epsilon(t) \le -\frac{\alpha}{2} V^\epsilon(t) + c_1$

for some $c_1 < \infty$. Inequality (7.8) yields, for some $c_2 < \infty$,

(7.9)  $E V^\epsilon(t) \le e^{-\alpha t/2} E V^\epsilon(0) + c_2.$
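The passage from (7.8) to (7.9) is a standard comparison argument. As a sketch, assume that $E V^\epsilon(t)$ is differentiable in t with derivative $E A^\epsilon V^\epsilon(t)$; then

```latex
\begin{align*}
\frac{d}{dt} E V^\epsilon(t) &\le -\frac{\alpha}{2} E V^\epsilon(t) + c_1, \\
\frac{d}{dt} \bigl[ e^{\alpha t/2} E V^\epsilon(t) \bigr] &\le c_1 e^{\alpha t/2}, \\
E V^\epsilon(t) &\le e^{-\alpha t/2} E V^\epsilon(0)
    + \frac{2 c_1}{\alpha} \bigl( 1 - e^{-\alpha t/2} \bigr)
  \le e^{-\alpha t/2} E V^\epsilon(0) + \frac{2 c_1}{\alpha},
\end{align*}
```

so that one admissible choice of the constant is $c_2 = 2c_1/\alpha$.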

Now use the bound on $V_1^\epsilon(x,0)$ obtained from the estimates below
(7.6) to get that (for some $\epsilon_0 > 0$)

$\sup_{\epsilon_0 > \epsilon,\ t} E V(x^\epsilon(t)) < \infty,$

which yields (C1) and (C3).

By using the method and conditions in [3, Chapter 6.8], the conditions
(D1)-(D4) can be weakened. In particular,
$V_x'(x) B(x) \le -\alpha V(x) + c$ can be replaced by the condition that
$V_x'(x) B(x) \le -\alpha < 0$ for large |x|, and some $\alpha > 0$.

8.  Extensions 

Extensions  of  the  results  in  Sections  4  to  6  to  all  the  standard  control 
problem  formulations  are  quite  possible.  Here,  we  mention  only  a  few 
possibilities. 

8.1.  Stopping  Times 

Let G be a bounded open set with a piecewise differentiable boundary,
and define

$R^\epsilon(m) = E \int_0^{\tau^\epsilon(m)} ds \int k(x^\epsilon(s),\alpha)\, m_s(d\alpha),$

$\tau^\epsilon(m) = \inf\{t : x^\epsilon(t) \notin G\},$

where $x^\epsilon(\cdot)$ is the solution to (4.1) which corresponds to m.
Define R(m), the cost for (3.1), in a similar way, with
$\tau(m) = \inf\{t : x(t) \notin G\}$.
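On a sampled path, the exit time and the stopped cost have direct discrete counterparts. The following is a minimal sketch for a path given at equally spaced times with step `dt` and an ordinary (non-relaxed) control sequence; the function names and the membership test `in_G` are hypothetical.

```python
def first_exit_index(path, in_G):
    """Index of the first sample outside G, or None if the path stays in G."""
    for n, x in enumerate(path):
        if not in_G(x):
            return n
    return None

def stopped_cost(path, controls, k, in_G, dt):
    """Riemann sum of k(x, u) over the samples strictly before the first exit."""
    n_exit = first_exit_index(path, in_G)
    stop = len(path) if n_exit is None else n_exit
    return sum(k(path[n], controls[n]) for n in range(stop)) * dt
```

For example, with G the open interval (-1, 1), `in_G = lambda x: abs(x) < 1` and the path `[0.0, 0.5, 1.5, 0.2]`, the exit index is 2 and only the first two samples contribute to the cost.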

In extending Theorem 5 to this case, only two problems arise. First, is
$\sup_\epsilon E_x \tau^\epsilon(m^\epsilon) < \infty$ for the various
sequences $\{m^\epsilon(\cdot)\}$ which are used? Second, if
$(x^\epsilon(\cdot), m^\epsilon(\cdot)) \Rightarrow (x(\cdot), m(\cdot))$, do
the exit times also converge? The answers are affirmative under broad
conditions, certainly if $\{a_{ij}(x)\}$ is uniformly positive definite in G.
We discuss the questions in the simple case where $\xi^\epsilon(\cdot)$ is
Markov and bounded.

Suppose that there are $\delta > 0$ and $p > 0$ such that

(8.1)  $\inf_{m \in RC} P_x\{x(m,t) \notin N_\delta(G),\ \text{some } t \le T\} \ge p,$

where $N_\delta(G)$ is a $\delta$-neighborhood of G and $P_x$ denotes the
probability given the initial condition x. Then it follows that there is a
$p_1 > 0$ such that for any sequence of $m^\epsilon(\cdot) \in RC^\epsilon$,

(8.2)  $\liminf_\epsilon P_{x,\xi}\{x^\epsilon(m^\epsilon,t) \notin G,\ \text{some } t \le 2T\} \ge p_1,$

where $P_{x,\xi}$ denotes the probability given the initial conditions
$x, \xi$.

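Suppose that (8.2) is false. Then there are $\epsilon \to 0$, and (bounded)
initial conditions $(x^\epsilon, \xi^\epsilon)$ with $x^\epsilon \in G$, such
that

(8.3)  $\lim_\epsilon P_{x^\epsilon,\xi^\epsilon}\{x^\epsilon(m^\epsilon,t) \notin G,\ \text{some } t \le 2T\} = 0.$

There is a subsequence (indexed by $\epsilon$) and $m(\cdot) \in RC$ such that
$\{x^\epsilon(m^\epsilon,\cdot), m^\epsilon(\cdot)\} \Rightarrow \{x(m,\cdot), m(\cdot)\}$.
Then (8.3) is contradicted by (8.1). It follows from (8.2) that there is an
$\epsilon_0 > 0$ such that

$\sup_{\epsilon_0 > \epsilon > 0,\ x,\ \xi} E_{x,\xi}\, \tau^\epsilon(m) < \infty.$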


In the non-degenerate case, if
$\{x^\epsilon(m^\epsilon,\cdot), m^\epsilon(\cdot)\} \Rightarrow (x(m,\cdot), m(\cdot))$,
then the exit times also converge. This follows from the weak convergence and
the fact that $x(m,\cdot)$ crosses the boundary of G infinitely often in
$[\tau(m), \tau(m)+\Delta]$, for any $\Delta > 0$.

8.2.  State Dependent Noise

The results of Sections 4 to 6 can be extended to the case where the
evolution of $\xi^\epsilon(\cdot)$ depends on $x^\epsilon(\cdot)$ or
$\{\xi_n\}$ depends on $\{x^\epsilon_n\}$. The technique is a combination of
the control 'representation' results of this paper and the weak convergence
methods of the state dependent noise or singular perturbations sections of
[3]. The main problems concern, as before, tightness and the representation of
the limit as a particular control problem.

